1. A computer assisted/implemented method for developing a classifier for
classifying communications comprising the steps of: 

(a) presenting communications to a user for labeling as relevant or irrelevant, the 
communications being selected from groups of communications including: 

a training set group of communications, the training set group of 
communications being selected by a traditional active learning algorithm; 

a system-labeled set of communications previously labeled by the system; 

a test set group of communications, the test set group of communications being for
testing the accuracy of a current state of the classifier being developed by the present method;

a faulty set of communications suspected to be previously mis-labeled by the user; and

a random set of communications previously labeled by the user; 

(b) developing a classifier for classifying communications based upon the 
relevant/irrelevant labels assigned by the user during the presenting step. 



2. The method of claim 1, wherein the presenting step includes the steps of:

assessing a value that labeling a set of communications from each group will
provide to the classifier being developed; and

selecting a next group for labeling based upon the greatest respective value that
will be provided to the classifier being developed from the assessing step.
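
By way of illustration only, the assessing and selecting steps of claim 2 might be sketched as follows. The group names mirror the groups recited in claim 1; the function name `select_next_group` and the numeric scores are hypothetical placeholders, since the claims do not specify how the value is computed.

```python
from typing import Dict

def select_next_group(assessed_value: Dict[str, float]) -> str:
    """Return the group whose labeling is assessed to provide the greatest value.

    `assessed_value` maps a group name to the (assumed, externally computed)
    value that labeling a set of communications from that group would provide.
    """
    return max(assessed_value, key=assessed_value.get)

# Placeholder scores for illustration; a real system would compute these.
assessed = {
    "training": 0.42,
    "system_labeled": 0.10,
    "test": 0.55,
    "faulty": 0.30,
    "random": 0.05,
}
next_group = select_next_group(assessed)
```

In this sketch the next set of communications presented to the user would be drawn from `next_group`, and the scores would be reassessed after each round of labeling.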

3. A computer assisted/implemented method for developing a classifier for
classifying communications (text, electronic, etc.) comprising the steps of: 

(a) presenting communications to a user for labeling as relevant or irrelevant, the 
communications being selected from groups of communications including: 

a training set group of communications, the training set group of 
communications being selected by a traditional active learning algorithm; 

a test set group of communications, the test set group of communications being for
testing the accuracy of a current state of the classifier being developed by the present method;
and

a previously-labeled set of communications previously labeled by at least 
one of the user, the system and another user; 

(b) developing a classifier for classifying communications based upon the 
relevant/irrelevant labels assigned by the user during the presenting step. 

4. The method of claim 3, wherein the previously-labeled set of communications 
includes communications previously labeled by the user. 

5. The method of claim 4, wherein the previously-labeled set of communications 
includes communications suspected by the system to be possibly mis-labeled by the user.
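
As a non-limiting sketch, one way a system could form the set of suspected mis-labeled communications of claim 5 is to flag user labels that the current classifier confidently contradicts. The claims do not state how suspicion is determined, so the function `suspect_mislabeled`, the probability inputs, and the 0.9 threshold are all assumptions made for illustration.

```python
from typing import Dict, List

def suspect_mislabeled(
    user_labels: Dict[int, bool],
    prob_relevant: Dict[int, float],
    threshold: float = 0.9,
) -> List[int]:
    """Return ids whose user label strongly disagrees with the classifier.

    `user_labels[i]` is True when the user labeled communication i relevant.
    `prob_relevant[i]` is the classifier's (assumed) probability of relevance.
    """
    suspects = []
    for doc_id, labeled_relevant in user_labels.items():
        p = prob_relevant[doc_id]
        if labeled_relevant and p <= 1.0 - threshold:
            suspects.append(doc_id)  # user said relevant, classifier says not
        elif not labeled_relevant and p >= threshold:
            suspects.append(doc_id)  # user said irrelevant, classifier says relevant
    return suspects

user_labels = {1: True, 2: False, 3: True, 4: False}
prob_relevant = {1: 0.95, 2: 0.97, 3: 0.03, 4: 0.40}
faulty_set = suspect_mislabeled(user_labels, prob_relevant)
```

Communications in `faulty_set` could then be presented to the user again for confirmation or correction.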

6. The method of claim 3, wherein the previously-labeled set of communications 
includes communications previously labeled by the system. 

7. The method of claim 3, wherein the previously-labeled set of communications 
includes communications previously labeled by a user and communications previously labeled 
by the system. 

8. The method of claim 3 wherein the presenting step includes the steps of:

assessing a value that labeling a set of communications from each group will
provide to the classifier being developed; and

selecting a next group for labeling based upon the greatest respective value that
will be provided to the classifier being developed from the assessing step.

9. The method of claim 3 wherein the presenting step includes the steps of:

assessing a value that labeling a set of communications from each group will
provide to the classifier being developed; and

selecting a next group for labeling based upon achieving known performance
bounds for the classifier.

10. The method of claim 3 further comprising the step of developing an expression of
labeling criteria in an interactive session with the user. 

11. The method of claim 10, wherein the interactive session includes the steps of
posing hypothetical questions to the user regarding what type of information the user would 
consider relevant. 

12. The method of claim 11, wherein the hypothetical questions elicit "yes", "no" and 
"unsure" responses from the user. 

13. The method of claim 11 wherein subsequent questions are based, at least in part,
upon the answers given to previous questions. 

14. The method of claim 11 wherein the step of developing an expression of labeling
criteria produces a criteria document. 

15. The method of claim 14 wherein the expression and/or the criteria document
include a group of keywords and/or phrases for use by the system in automatically labeling 
communications. 

16. The method of claim 10 wherein the step of developing an expression of labeling 
criteria produces a criteria document. 

17. The method of claim 16 wherein the criteria document includes a list of items that 
are considered relevant and a list of items that are considered irrelevant.

18. The method of claim 17, wherein the presenting step (a) includes the step of 
querying the user which item(s) influenced the label on a user-labeled communication. 

19. The method of claim 10 wherein the expression and/or the criteria document 
include a group of keywords and/or phrases for use by the system in automatically labeling 
communications. 

20. The method of claim 10 wherein the interactive session is conducted prior to the
presenting step (a). 

21. A computer assisted/implemented method for developing a classifier for
classifying communications (text, electronic, etc.) comprising the steps of: 

(a) developing an expression of labeling criteria in an interactive session with the user;

(b) presenting communications to a user for labeling as relevant or irrelevant; and 

(c) developing a classifier for classifying communications based upon the 
relevant/irrelevant labels assigned by the user during the presenting step; 

wherein at least one of the presenting step (b) and the developing step (c) uses the
expression of labeling criteria developed in the developing step (a). 

22. The method of claim 21, wherein the interactive session includes the steps of
posing questions to the user regarding what type of information the user would consider relevant. 

23. The method of claim 22, wherein the questions elicit "yes", "no" and "unsure" 
responses from the user. 

24. The method of claim 21 wherein subsequent questions are based, at least in part, 
upon the answers given to previous questions. 

25. The method of claim 21 wherein the questions are structured from several 
dimensional levels of relevance, including a first dimension of question segments on a topic, a 
second dimension of question segments on an aspect of the topic and a third dimension of 
question segments on a type of discussion. 

26. The method of claim 25, wherein: 

the first dimension of question segments on a topic includes one or more of the
following segments: a segment concerning a client's product and a segment concerning a client's 
competitors; 

the second dimension of question segments on an aspect of the topic includes one or more of the
following segments: a segment concerning a feature of the first segment, a segment concerning 
the first segment itself, a segment concerning corporate activity of the first segment, a segment 
concerning price of the first segment, a segment concerning news of the first segment and a 
segment concerning advertising of the first segment; 

the third dimension of question segments on a type of discussion includes one or more of the
following segments: a segment concerning a mention of the second dimension segment, a 
segment concerning a description of the second dimension segment, a segment concerning a 
usage statement about the second dimension segment, a segment concerning a brand comparison 
involving the second dimension segment, and a segment concerning an opinion about the second 
dimension segment. 
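
For illustration, the three-dimensional question structure of claims 25 and 26 can be read as composing one segment from each dimension into a single question. The sketch below assumes a particular sentence template and a small sample of segments; the function `build_questions` and the exact wording are hypothetical and are not taken from the claims.

```python
from itertools import product
from typing import List

# Sample segments for each dimension (a subset, worded for illustration).
topics = ["the client's product", "a competitor's product"]
aspects = ["a feature of", "the price of", "advertising for"]
discussions = ["a mention of", "an opinion about", "a brand comparison involving"]

def build_questions(topics: List[str], aspects: List[str], discussions: List[str]) -> List[str]:
    """Compose one question per (topic, aspect, discussion type) combination."""
    return [
        f"Would you consider {discussion} {aspect} {topic} relevant?"
        for topic, aspect, discussion in product(topics, aspects, discussions)
    ]

questions = build_questions(topics, aspects, discussions)
```

Each generated question can be answered "yes", "no" or "unsure", and a system following claim 24 could prune later combinations based on earlier answers rather than asking all of them.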

27. The method of claim 21 wherein the step of developing an expression of labeling
criteria produces a criteria document. 

28. The method of claim 27 wherein the criteria document includes a list of items that 
are considered relevant and a list of items that are considered irrelevant. 

29. The method of claim 28 wherein the criteria document includes a group of 
keywords for use by the system in automatically labeling communications. 

30. The method of claim 28, wherein the presenting step (b) includes the step of 
querying the user which items influenced the label on a user-labeled communication. 

31. The method of claim 21 wherein the expression of labeling criteria includes a
group of keywords and/or phrases for use by the system in automatically labeling 
communications. 

32. The method of claim 31 wherein the group of keywords is also for use by the 
system in a step of gathering communications. 

33. A computer assisted/implemented method for developing a classifier for 
classifying communications (text, electronic, etc.) comprising the steps of: 

(a) defining a domain of communications on which the classifier is going to operate;

(b) collecting a set of communications from the domain; 

(c) eliciting labeling communication criteria from a user; 

(d) labeling, by the system, communications from the set of communications 
according, at least in part, to the labeling communication criteria elicited from the user; 

(e) labeling, by the user, communications from the set of communications; 

(f) building a communications classifier according to a combination of labels 
applied to communications in labeling steps (d) and (e). 
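
A minimal sketch of steps (d) through (f) of claim 33 follows, under stated assumptions: the system labels by matching elicited keywords, a user label takes precedence where both exist, and the classifier is a toy word-vote model. None of these choices is specified by the claims, and the names `system_label`, `combine_labels` and `build_classifier` are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Set

def system_label(text: str, keywords: Set[str]) -> bool:
    """Step (d), assumed rule: relevant if any elicited keyword appears."""
    return bool(set(text.lower().split()) & keywords)

def combine_labels(system_labels: Dict[int, bool], user_labels: Dict[int, bool]) -> Dict[int, bool]:
    """Merge both label sources; a user label overrides a system label (assumption)."""
    combined = dict(system_labels)
    combined.update(user_labels)
    return combined

def build_classifier(docs: Dict[int, str], labels: Dict[int, bool]) -> Callable[[str], bool]:
    """Step (f), toy model: each word votes +1 (relevant) or -1 (irrelevant)."""
    votes: Counter = Counter()
    for doc_id, is_relevant in labels.items():
        for word in set(docs[doc_id].lower().split()):
            votes[word] += 1 if is_relevant else -1

    def classify(text: str) -> bool:
        return sum(votes[w] for w in set(text.lower().split())) > 0

    return classify

docs = {
    1: "the battery life of this phone is great",
    2: "lunch was great today",
    3: "phone battery drains fast",
    4: "weather is fine today",
    5: "a battery of tests at school",
}
system_labels = {i: system_label(t, {"battery"}) for i, t in docs.items()}
user_labels = {3: True, 5: False}  # step (e): the user corrects communication 5
labels = combine_labels(system_labels, user_labels)
classify = build_classifier(docs, labels)
```

Here communication 5 matches the keyword but is labeled irrelevant by the user, which shows why the combination of the two label sources matters to the resulting classifier.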

34. The computer implemented method of claim 33, wherein the combination of the
labeling steps (d) and (e), and the building step (f) includes the step of selecting communications 
for labeling by the user targeted to build the communications classifier within known 
performance bounds. 

35. The computer implemented method of claim 34, wherein the selecting step selects 
communications from groups of communications including: 

a training set group of communications, the training set group of communications 
being selected by a traditional active learning algorithm; 

a test set group of communications, the test set group of communications being for testing
the accuracy of a current state of the classifier being developed by the present method; and

a previously-labeled set of communications previously labeled by at least one of 
the user, the system and another user. 

36. The computer implemented method of claim 34, wherein the selecting step selects 
communications from groups of communications including: 

a training set group of communications, the training set group of communications 
being selected by a traditional active learning algorithm; 

a system-labeled set of communications previously labeled by the system; 

a test set group of communications, the test set group of communications being for testing
the accuracy of a current state of the classifier being developed by the present method;

a faulty set of communications suspected to be previously mis-labeled by the user; and

a random set of communications previously labeled by the user. 

37. The computer implemented method of claim 33, wherein the communication 
criteria elicited in the eliciting step (c) is used, in part, to determine communications to collect in 
the collecting step (b). 

38. The computer implemented method of claim 37, wherein the eliciting step (c)
involves an interactive session with the user. 

39. The computer implemented method of claim 37, wherein the communication
criteria elicited in the eliciting step (c) is used, in part, by the system to label communications in 
the labeling step (d). 

40. The computer implemented method of claim 39, wherein the eliciting step (c) 
involves an interactive session with the user. 

41. The method of claim 33, wherein the building step (f) involves an active learning
process. 

42. The computer implemented method of claim 33, wherein the communication
criteria elicited in the eliciting step (c) is used, in part, by the system to label communications in 
the labeling step (d). 

43. The computer implemented method of claim 33, wherein the eliciting step (c) 
involves an interactive session with the user. 

44. The method of claim 43, wherein the interactive session includes the steps of 
posing questions to the user regarding what type of information the user would consider relevant. 

45. The method of claim 44, wherein the interactive session also allows the user to 
provide keywords based upon criteria the user considers relevant.

47. The method of claim 44, wherein the questions elicit "yes", "no" and "unsure" 
responses from the user. 

48. The method of claim 43, wherein the building step (f) involves an active learning
process. 

49. A computer memory containing a software program including instructions for 
implementing a method for developing a classifier for classifying communications (text, 
electronic, etc.) comprising the steps of: 

(a) presenting communications to a user for labeling as relevant or irrelevant, the 
communications being selected from groups of communications including: 

a training set group of communications, the training set group of 
communications being selected by a traditional active learning algorithm; 

a test set group of communications, the test set group of communications being for
testing the accuracy of a current state of the classifier being developed by the present method;
and

a previously-labeled set of communications previously labeled by at least 
one of the user, the system and another user; 

(b) developing a classifier for classifying communications based upon the 
relevant/irrelevant labels assigned by the user during the presenting step. 

50. A computer memory containing a software program including instructions for 
implementing a method for developing a classifier for classifying communications (text, 
electronic, etc.) comprising the steps of: 

(a) developing an expression of labeling criteria in an interactive session with the user;

(b) presenting communications to a user for labeling as relevant or irrelevant; and 

(c) developing a classifier for classifying communications based upon the 
relevant/irrelevant labels assigned by the user during the presenting step; 

wherein at least one of the presenting step (b) and the developing step (c) uses the
expression of labeling criteria developed in the developing step (a). 

51. A computer system memory containing a software program including instructions
for implementing a method for developing a classifier for classifying communications (text, 
electronic, etc.) comprising the steps of: 

(a) defining a domain of communications on which the classifier is going to operate;

(b) collecting a set of communications from the domain; 

(c) eliciting labeling communication criteria from a user; 

(d) labeling, by the computer system, communications from the set of 
communications according, at least in part, to the labeling communication criteria elicited from 
the user; 

(e) labeling, by the user, communications from the set of communications; 

(f) building a communications classifier according to a combination of labels 
applied to communications in labeling steps (d) and (e). 